1. INTRODUCTION

Academic quality is one of the most important factors in the existence and sustainability of a university.
To improve academic quality assurance, universities must be able to manage data effectively
and find hidden knowledge in the data to support management decision making. Several universities have
used data mining as part of education quality assurance with the aim of discovering knowledge and providing
timely data for academic decision making [1], [2]. Data mining, commonly known as knowledge discovery in
databases (KDD), is an activity related to data collection and the use of historical data to find knowledge and
information in big data [3]. In data mining, visualization is one of the easiest ways to understand
multidimensional data structures and data analysis [4]. One clustering and visualization method that is
often used is the self-organizing map (SOM), which maps high-dimensional data to a low-dimensional space while
maintaining the topological structure of the data [5].

The problem that arises is that conventional clustering cannot automatically conclude where a
higher education institution’s academic performance stands compared to others. Likewise, the analysis of
clustering results in conventional methods usually focuses on the quality of the clustering itself, measured
for example by entropy or the F-measure [6]-[8]. However, clustering results do not automatically inform a
university’s academic performance. Meanwhile, a quality assurance system is needed that can provide information
on academic performance quickly and accurately. Therefore, advanced techniques are needed that can
summarize the results of academic clustering automatically, quickly, and accurately.

This article proposes a new technique, namely self-organizing map and similarity to an ideal
solution (SOM-SIS), which automatically summarizes clustering results by combining the SOM technique with the
technique for order preference by similarity to an ideal solution (TOPSIS). TOPSIS was
developed by Yoon and Hwang [9]. This technique uses the basic concept that the chosen alternative must
have the shortest distance from the positive ideal solution and the farthest distance from the negative ideal solution.

SOM-SIS is applied to three academic parameters that represent university output, namely student
achievement [10]-[12], study period [13], [14], and drop out rate [15], [16]. Student learning achievement is
measured by the grade point average (GPA) [17], [18], the study period is the length of student learning [19]-[21],
while the drop out rate is the student’s failure rate [22]. The choice of these three academic parameters refers to the
higher education accreditation instrument (HEAI) in Indonesia [23].

The auto-summarizing SOM-SIS method was tested on a dataset of 300 records taken from universities
in Indonesia. As a result, the position of a university compared to others can be determined automatically and
accurately from the SOM results. These results were validated by three university quality assurance experts,
with a conformity level of 100%.


2. RESEARCH METHODS 

The development stage of the SOM-SIS auto-summarizing academic quality assurance system is 
described in Figure 1. The research stages are divided into two important parts, namely data filtering and 
mining, as well as clustering and auto-summarizing. The filtering and mining stage begins with cleaning, 
integrating, selecting, transforming into the desired form [24], and mining data using the SOM algorithm. The next 
stage consists of processing the SOM results using the TOPSIS algorithm to generate auto-summarizing SOM-SIS. 


Figure 1. Research block diagram (stages: pre-process, process)


Data cleaning is the process of removing noise and inconsistent or irrelevant data. Data cleaning is
carried out if the data obtained from the university database contains imperfect entries such as missing,
invalid, or mistyped values. Such entries are then discarded or replaced with an appropriate value.
Data cleaning is followed by data integration, which combines data from various databases into one
new database. Data on attributes such as high school, college entrance system, national exam
scores, achievement index for semester 1 to semester 4, cumulative grade index for semester 4, final
cumulative index, residence, parental salary, study period, and priority study program, which were spread over
several files, are put together in a single file. The data are then selected according to the needs of the analyzed
parameters, namely student achievement, study period, and drop out rate. Furthermore, the data are
transformed through a conversion process, namely changing one data format to another so that it
can be read by the system used for mining. The conversion is necessary because the
required academic data have different units and data types, so they need to be converted into an equivalent
numerical form. After conversion, the data are normalized using the min-max method: for each input
attribute the minimum (Min) and maximum (Max) values are determined, and the data are then
rescaled [25] using (1).


$$\mathrm{Newdata} = (\mathrm{data} - \mathrm{Min}) \times \frac{\mathrm{NewMax} - \mathrm{NewMin}}{\mathrm{Max} - \mathrm{Min}} + \mathrm{NewMin} \tag{1}$$


where Newdata is the normalized value, Min and Max are the minimum and maximum values of the
data, and NewMin and NewMax are the lower and upper limits of the target range.
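
The rescaling in (1) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the code used in the study: the function name `min_max_normalize`, the default target range [0, 1], the handling of a constant column, and the sample GPA values are all assumptions made for the example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers to [new_min, new_max] following (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: (1) would divide by zero, so map everything
        # to new_min (an assumption, the paper does not cover this case).
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [(v - lo) * scale + new_min for v in values]


# Hypothetical GPA values, not taken from the study's dataset.
gpas = [2.0, 2.5, 3.0, 4.0]
normalized = min_max_normalize(gpas)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

Normalizing every attribute to the same range keeps attributes with large units, such as parental salary, from dominating the distance computation used later in clustering.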
The clustering process using SOM is carried out after all the required data has been normalized.
Clustering is done using MiniSom, a Python library that implements the SOM.
The clustering process begins with the formation of a SOM network map based on the input data,
then a learning process is carried out over several iterations to produce an ideal weight matrix.
The ideal weight matrix is then used to map the input data into groups of output data. The learning
process is based on the distance between the input data and the weight matrix. After the weights are initialized,
the training process follows. In the unsupervised learning algorithm of the Kohonen SOM network [26], this distance is computed with (2).


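$$D_j = \sum_{i} (W_{ij} - X_i)^2 \tag{2}$$


where $D_j$ is the Euclidean distance (in squared form) between the input vector and the j-th neuron, $W_{ij}$ is the
weight between the j-th output neuron and the i-th input neuron, and $X_i$ is the i-th component of the input vector.
After the winning neuron is found, the weights of the winning neuron and its neighboring neurons are updated with (3).


$$W_{ij}(t+1) = W_{ij}(t) + \alpha(t)\,[X_i - W_{ij}(t)] \tag{3}$$


where $W_{ij}$ is the weight between the j-th output neuron and the i-th input neuron and $\alpha(t)$ is the learning rate;
the neighbor function determines which neighboring neurons are updated together with the winner. The stages of the
Kohonen SOM algorithm are given in [26]: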
- Initialize the weights $W_{ij}$ with random values, and set the learning rate and the neighbor function.
- Select an input $X_i$ randomly from the input set.
- Calculate the degree of similarity using the Euclidean distance $D_j$ in (2) for all neurons j.
- Select the winning neuron, that is, the neuron with the minimum Euclidean distance.
- Update the weights $W_{ij}$ of the winning neuron and of its neighboring neurons using (3).
- Update the learning rate and reduce the neighbor function linearly or exponentially.
- Repeat steps 2 to 6 until the epoch value (maximum number of iterations) is reached.
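
The stages listed above can be sketched as a short training loop. This is an illustrative pure-Python version, not the MiniSom implementation used in the study: the function names, the rectangular grid, the step-shaped neighborhood (every neuron within a shrinking grid radius is updated), the linear decay schedule, and the sample data are all assumptions.

```python
import random


def best_matching_unit(weights, x):
    """Return the grid position of the neuron closest to x, using (2)."""
    return min(weights, key=lambda pos: sum(
        (wi - xi) ** 2 for wi, xi in zip(weights[pos], x)))


def train_som(data, rows, cols, epochs=500, lr0=0.5, radius0=1.0, seed=0):
    """Train a small Kohonen SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialization: one random weight vector per map neuron.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        # Linear decay of the learning rate and the neighborhood radius.
        frac = 1.0 - t / epochs
        lr, radius = lr0 * frac, radius0 * frac
        x = rng.choice(data)                      # random input vector
        winner = best_matching_unit(weights, x)   # minimum distance (2)
        for (r, c), w in weights.items():
            grid_dist = abs(r - winner[0]) + abs(c - winner[1])
            if grid_dist <= radius:               # winner and its neighbors
                for i in range(dim):
                    w[i] += lr * (x[i] - w[i])    # update rule (3)
    return weights


# Hypothetical normalized inputs forming two loose groups.
data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data, rows=1, cols=2)
print([best_matching_unit(som, x) for x in data])
```

After training, each input is assigned to the cluster of its best matching unit, which corresponds to the mapping of input data into output groups described in the text.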
The Davies-Bouldin index (DBI) is a metric introduced by Davies and Bouldin in 1979 [27]. DBI
evaluates clusters by first calculating the sum of squares within clusters (SSW) as a cohesion
metric for each cluster i. The SSW is computed with (4).


$$SSW_i = \frac{1}{m_i} \sum_{j=1}^{m_i} d(x_j, c_i) \tag{4}$$


where $m_i$ is the number of input data in the i-th cluster, $x_j$ is the j-th member of that cluster, and $c_i$ is the
centroid of the i-th cluster. The sum of squares between clusters (SSB) measures separation as the distance between
the centroids (weights) of two clusters, for example cluster i ($c_i$) and cluster j ($c_j$), as in (5).


$$SSB_{i,j} = d(c_i, c_j) \tag{5}$$


Furthermore, $R_{ij}$ is the comparison value between cluster i and cluster j. The value is obtained from the
components of cohesion and separation. A good cluster has the smallest possible cohesion value and the largest
possible separation value, as in (6).


$$R_{ij} = \frac{SSW_i + SSW_j}{SSB_{i,j}} \tag{6}$$


The DBI value is obtained from (7), where k is the number of clusters.


$$DBI = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} R_{ij} \tag{7}$$
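
The evaluation in (4) to (7) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `davies_bouldin` is hypothetical, cluster centroids are taken as the mean of the cluster members (the study may instead use the SOM weights as centroids), d is the Euclidean distance, and at least two clusters are required.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index following (4)-(7); lower values are better."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    # Centroid c_i of each cluster (assumed here to be the member mean).
    centroids = {lab: tuple(sum(col) / len(members) for col in zip(*members))
                 for lab, members in clusters.items()}
    # Cohesion (4): mean distance of the members to their own centroid.
    ssw = {lab: sum(math.dist(p, centroids[lab]) for p in members) / len(members)
           for lab, members in clusters.items()}
    total = 0.0
    for i in clusters:
        # (5) and (6): ratio R_ij against every other cluster, keep the largest.
        total += max((ssw[i] + ssw[j]) / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)  # (7)


# Hypothetical data: two compact, well-separated clusters.
points = [(0, 0), (0, 2), (4, 0), (4, 2)]
labels = [0, 0, 1, 1]
dbi = davies_bouldin(points, labels)
print(dbi)  # 0.5
```

In this example each cluster has cohesion 1 and the centroids are 4 apart, so both ratios equal 2/4 and the index is 0.5.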


The stages of the TOPSIS method are described in [9]. The normalized decision matrix is determined as in (8).


$$R_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^{2}}} \tag{8}$$


The weighted normalized decision matrix is then determined using the criteria weights in Table 1.
Table 1. Criteria weights

No  Criteria             Weight
1   Student performance  5
2   Study period         4
3   Drop out             3


The weighted normalized matrix is calculated as in (9), where $W_j$ is the weight of criterion j.

$$Y_{ij} = W_j R_{ij} \tag{9}$$


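The positive ideal solution and the negative ideal solution are determined as in (10) and (11).
The positive ideal solution ($A^+$) is determined by:


$$A^+ = (y_1^+, y_2^+, \ldots, y_n^+), \qquad y_j^+ = \begin{cases} \max_i Y_{ij} & \text{if } j \text{ is a benefit attribute} \\ \min_i Y_{ij} & \text{if } j \text{ is a cost attribute} \end{cases} \tag{10}$$

The negative ideal solution ($A^-$) is determined by:

$$A^- = (y_1^-, y_2^-, \ldots, y_n^-), \qquad y_j^- = \begin{cases} \max_i Y_{ij} & \text{if } j \text{ is a cost attribute} \\ \min_i Y_{ij} & \text{if } j \text{ is a benefit attribute} \end{cases} \tag{11}$$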
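The distance between alternative $A_i$ and the positive ideal solution is defined as (12), and the distance to the
negative ideal solution as (13), where $y_j^+$ and $y_j^-$ are the j-th components of $A^+$ and $A^-$.

$$D_i^+ = \sqrt{\sum_{j=1}^{n} (y_j^+ - Y_{ij})^2} \tag{12}$$

$$D_i^- = \sqrt{\sum_{j=1}^{n} (Y_{ij} - y_j^-)^2} \tag{13}$$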


The decision matrix D, from which the preference value of each alternative is derived, refers to m alternatives
that are evaluated on the specified criteria, where $x_{ij}$ is the performance of the i-th alternative on
the j-th attribute. The closeness of each alternative to the ideal solution is calculated according to (14).


$$V_i = \frac{D_i^-}{D_i^- + D_i^+}, \qquad i = 1, 2, 3, \ldots, m \tag{14}$$


The SOM-SIS method starts once the results of clustering with SOM are known; TOPSIS is then used to
auto-summarize the academic performance of the university. SOM-SIS thus determines the
level of university academic performance from the SOM results using the TOPSIS decision support method.
Table 2 lists the criterion value of each cluster for each parameter, and Table 3 lists the combinations of
dominant cluster values from the SOM results.


Table 2. Criterion value

Criteria             Description   Value
Student performance  Poor          1
                     Fair          2
                     Good          3
Study period         On time       1
                     Not on time   2
Drop out             No potential  1
                     Potential     2


Clustering the three academic parameters with SOM produces a combination of dominant cluster values, which in the
TOPSIS method is treated as an alternative. The possible combinations of the values of the three academic parameters
(3 x 2 x 2) produce 12 alternatives, referred to as cluster channels. The SOM-SIS rule base is shown in Figure 2.
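
As a hedged illustration of how the TOPSIS stages in (8) to (14) apply to the 12 alternatives, the sketch below enumerates every combination of the criterion values in Table 2 and scores it with the weights in Table 1. It is not the authors' implementation: the function name `topsis` is hypothetical, and treating student performance as a benefit criterion and the other two as cost criteria is an assumption based on the coding in Table 2.

```python
import math
from itertools import product

# Criterion codes from Table 2 and weights from Table 1.
PERFORMANCE = (1, 2, 3)   # poor, fair, good
STUDY_PERIOD = (1, 2)     # on time, not on time
DROP_OUT = (1, 2)         # no potential, potential
WEIGHTS = (5, 4, 3)
# Assumption: higher performance is better, lower codes are better otherwise.
IS_BENEFIT = (True, False, False)


def topsis(matrix, weights, is_benefit):
    """Return the closeness V_i of each row of `matrix`, following (8)-(14)."""
    norms = [math.sqrt(sum(x * x for x in col)) for col in zip(*matrix)]   # (8)
    y = [[w * x / n for x, w, n in zip(row, weights, norms)]
         for row in matrix]                                                # (9)
    cols = list(zip(*y))
    best = [max(c) if b else min(c) for c, b in zip(cols, is_benefit)]     # (10)
    worst = [min(c) if b else max(c) for c, b in zip(cols, is_benefit)]    # (11)
    closeness = []
    for row in y:
        d_plus = math.dist(row, best)                    # (12)
        d_minus = math.dist(row, worst)                  # (13)
        closeness.append(d_minus / (d_minus + d_plus))   # (14)
    return closeness


alternatives = list(product(PERFORMANCE, STUDY_PERIOD, DROP_OUT))  # 12 rows
scores = dict(zip(alternatives, topsis(alternatives, WEIGHTS, IS_BENEFIT)))
for combo, v in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(combo, round(v, 2))
```

Under these assumptions the combination (3, 1, 1) coincides with the positive ideal solution and scores 1, while (1, 2, 2) coincides with the negative ideal solution and scores 0. The remaining values come out close to, though not necessarily identical with, the preference values reported in Table 5.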
Table 3. Combination of dominant cluster values

12-channel cluster  Student performance  Study period  Drop out
A                   1                    1             1
B                   1                    2             2
C                   1                    1             2
D                   1                    2             1
E                   2                    1             1
F                   2                    2             2
G                   2                    1             2
H                   2                    2             1
I                   3                    1             1
J                   3                    2             2
K                   3                    1             2
L                   3                    2             1


Figure 2. SOM-SIS rule base


Table 4 gives the ranges of the preference ranking, which is the final result of the SOM-SIS method. Based on
the combination of the dominant cluster values in Table 3, universities can find their cluster ranking and
academic performance by referring to Table 5. Table 5 is the result of the automatic conclusion, from which
universities can read their cluster channel, cluster rank, and college academic performance.


Table 4. Preference value range

Range  Cluster rank  College category
1-4    1             Good
5-8    2             Fair
9-12   3             Poor


Table 5. Auto-summarizing SOM-SIS

Preference  Condition of college  Value  Ranking  Cluster rank  College academic performance
V1          A                     0.40   8        2             Fair
V2          B                     0.00   12       3             Poor
V3          C                     0.33   10       3             Poor
V4          D                     0.26   11       3             Poor
V5          E                     0.62   4        1             Good
V6          F                     0.37   9        3             Poor
V7          G                     0.53   6        2             Fair
V8          H                     0.47   7        2             Fair
V9          I                     1.00   1        1             Good
V10         J                     0.59   5        2             Fair
V11         K                     0.74   2        1             Good
V12         L                     0.67   3        1             Good
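
The step from preference values to the summary in Table 5 can be sketched as a sort followed by a range lookup. The preference values below are copied from Table 5 and the ranges from Table 4. The function name `summarize` and the data layout are assumptions made for the example.

```python
# Preference values copied from Table 5 (condition of college -> V).
preference = {
    "A": 0.40, "B": 0.00, "C": 0.33, "D": 0.26, "E": 0.62, "F": 0.37,
    "G": 0.53, "H": 0.47, "I": 1.00, "J": 0.59, "K": 0.74, "L": 0.67,
}

# Table 4: ranking range -> (cluster rank, college category).
RANGES = [
    (range(1, 5), 1, "Good"),
    (range(5, 9), 2, "Fair"),
    (range(9, 13), 3, "Poor"),
]


def summarize(preference):
    """Rank conditions by descending preference and attach the Table 4 category."""
    ordered = sorted(preference, key=preference.get, reverse=True)
    summary = {}
    for ranking, condition in enumerate(ordered, start=1):
        for span, cluster_rank, category in RANGES:
            if ranking in span:
                summary[condition] = (ranking, cluster_rank, category)
    return summary


summary = summarize(preference)
print(summary["H"])  # (7, 2, 'Fair')
```

Each entry holds the ranking, the cluster rank, and the college category, which are the last three columns of Table 5.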
The SOM-SIS system was validated to determine whether it is in accordance with the academic quality
assurance system of higher education institutions in Indonesia. Validation was carried out by three university
quality assurance experts, who compared the SOM-SIS results with the HEAI assessment matrix [26], scored both
by manual calculation and by TOPSIS.


3. RESULTS AND ANALYSIS 

A dataset of 300 records was collected from a survey of universities in Indonesia for the 2011-2013
academic years. The parameters used for academic quality assurance are student performance, study
period, and drop out rate. Student performance is measured using the attributes: high school, university
entrance system, school exam scores, parents’ salaries, cumulative grade index for semester 4, and final
cumulative grade index. The study period is measured using the attributes: college entrance system, residence,
parental salary, and final cumulative achievement index. The drop out rate is measured using the criterion
of students who have not graduated after 8 semesters of undergraduate study, with the attributes: high school,
priority study program, college admission system, parents’ salaries, achievement index for semesters 1 to 4, and
number of semester credit units.


3.1. Clustering using the self organizing map 

The results of clustering using SOM are presented in Table 6. The cluster results of the three
academic parameters show that the “fair” cluster has the most members for the student performance
parameter, the “not on time” cluster is dominant for the study period parameter, and the “no
potential” cluster is dominant for the drop out rate. Based on the dominant clusters, the criterion value
of the three academic parameters is (2, 2, 1).

Figure 3(a) is a distribution map of the clustering results for the student performance parameter,
visualized as colored dots. Blue indicates that student performance is in the “good” cluster, orange indicates
“fair”, and green “poor”. For this parameter, the dominant cluster is orange. Figure 3(b) is a distribution map
of the clustering results for the study period parameter; blue indicates “on time” and orange “not
on time”. For this parameter, the dominant cluster is orange. Figure 3(c) is a distribution map of the clustering
results for the drop out parameter; blue indicates “no potential” and orange “potential”.
For this parameter, the dominant cluster is blue.


Table 6. SOM result

Parameter            Cluster       Number of members  Dominant cluster  Criterion value
Student performance  Good          22                 Fair              2
                     Fair          204
                     Poor          74
Study period         On time       94                 Not on time       2
                     Not on time   206
Dropout              No potential  167                No potential      1
                     Potential     133
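
The derivation of the criterion value (2, 2, 1) from Table 6 can be sketched as picking the largest cluster per parameter and looking up its code in Table 2. The member counts and codes below are copied from those tables. The function name `dominant_values` and the dictionary layout are assumptions made for the example.

```python
# Member counts from Table 6.
som_counts = {
    "Student performance": {"Good": 22, "Fair": 204, "Poor": 74},
    "Study period": {"On time": 94, "Not on time": 206},
    "Dropout": {"No potential": 167, "Potential": 133},
}

# Criterion codes from Table 2.
criterion_value = {
    "Poor": 1, "Fair": 2, "Good": 3,
    "On time": 1, "Not on time": 2,
    "No potential": 1, "Potential": 2,
}


def dominant_values(counts):
    """Pick the largest cluster per parameter and return its criterion codes."""
    dominant = {param: max(c, key=c.get) for param, c in counts.items()}
    return tuple(criterion_value[label] for label in dominant.values())


print(dominant_values(som_counts))  # (2, 2, 1)
```

The resulting tuple is the alternative that SOM-SIS looks up among the 12 cluster channels.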


3.2. Clustering evaluation 

Clustering evaluation is used to find out how well the data are grouped. Clustering evaluation in
this study uses the DBI validity test. Table 7 shows that the average DBI is quite good, with a value
of 1.11. The DBI for the study period and drop out rate parameters is 1.00, which is better than the
value of 1.34 for the student performance parameter.

After clustering with SOM, the next step is to integrate the clustering results into TOPSIS to
determine the preference value of the university. Integrating the two methods yields the
auto-summarizing SOM-SIS algorithm, which determines the academic quality of the
university automatically. Based on the results of clustering the three academic parameters, the dominant
cluster value is (2, 2, 1), which corresponds to cluster channel “H”, so the auto-summarizing SOM-SIS result
places the university’s academic performance in “rank 2” with the “fair” category, as shown in Table 8.


Table 7. Clustering evaluation

No  Parameters           QE  DBI
1   Student performance  <2  1.34
2   Study period         <1  1.00
3   Drop out             <2  1.00
    Mean                     1.11
Figure 3. SOM distribution maps for the parameters (a) student performance, (b) study period, and (c) drop out rate


Table 8. Result of dominant cluster value combination

No  Criteria             Description   Matrix value  12-channel cluster  Cluster rank  College academic performance
1   Student performance  Fair          2             H                   2             Fair
2   Study period         Not on time   2
3   Dropout              No potential  1


3.3. Similarity to an ideal solution validation 

SOM-SIS was validated by three higher education quality assurance experts using the HEAI score
matrix, through manual calculation and through TOPSIS. The validation
showed a conformity level of 100%. The conclusion is that the SOM-SIS system is
able to provide accurate conclusions in terms of cluster rank and college academic performance, as shown in Table 9.

Based on the results of clustering the academic data of students at a university in Indonesia using three
academic parameters, the SOM-SIS system summarizes them well, showing that the academic
performance of the university is in “rank 2” with the “fair” category. The SOM-SIS system can help quality
assurance units in universities to summarize the results of academic clustering and conclude the academic
performance of a university compared to others. Knowledge of the SOM-SIS results will assist university
management in making academic decisions to improve performance.


Table 9. SOM-SIS validation with HEAI

Comparison            Method                                                                 Conformity level
                      SOM-SIS result  HEAI matrix (manual score)  HEAI matrix (TOPSIS)
Academic performance  Fair            Fair                        Fair                       100%
4. CONCLUSION

In this study, SOM-SIS was able to automatically summarize the results of clustering a dataset of 300
records from universities in Indonesia. The results of clustering using three academic parameters are well
summarized by the SOM-SIS system, so that the academic performance of the college can be known.
The SOM-SIS system can help higher education quality assurance units to summarize the results of academic
clustering and conclude the academic performance of a university compared to others. Knowledge of the SOM-SIS
results will assist university management in making academic decisions to improve performance. The
validation carried out by three higher education quality assurance experts showed a 100% conformity
level. The conclusion is that the SOM-SIS system is able to summarize the results of clustering and
determine cluster rank and college academic performance accurately.

In future research, larger datasets should be used to support a better analysis of academic quality
assurance. Data transformation methods such as one-hot encoding or integer encoding can be used
to obtain more precise cluster results. Other academic parameters, such as graduate competence and acceptance in
the workforce, need to be added to better determine the academic performance of higher education institutions.